 call path


Interpreting Performance Profiles with Deep Learning

Liu, Zhuoran

arXiv.org Artificial Intelligence

Profiling tools (also known as profilers) play an important role in understanding program performance at runtime, revealing hotspots, bottlenecks, and inefficiencies. While profilers have proven useful, they place an extra burden on software engineers. As the users, software engineers are responsible for interpreting the complex performance data and identifying actionable optimizations in program source code. However, it can be challenging for users to associate inefficiencies with the program semantics, especially if the users are not the authors of the code, which limits the applicability of profilers. In this thesis, we explore a new direction that combines performance profiles and program semantics using a deep learning approach. The key idea is to glean code summaries for semantic information (at a certain level) and integrate them into a profiler, which can then better understand program inefficiencies for actionable optimization. Concretely, we combine profiles generated by Async Profiler (the state-of-the-art Java profiler) with code summarization from a fine-tuned CodeBERT-based model. We display the code summaries of any selected call path in a graphical user interface. Our system can effectively assist analysis on many Java benchmarks.


DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads

Zhao, Qidong, Wu, Hao, Hao, Yuming, Ye, Zilingfeng, Li, Jiajia, Liu, Xu, Zhou, Keren

arXiv.org Artificial Intelligence

Effective performance profiling and analysis are essential for optimizing training and inference of deep learning models, especially given the growing complexity of heterogeneous computing environments. However, existing tools often lack the capability to provide comprehensive program context information and performance optimization insights for sophisticated interactions between CPUs and GPUs. This paper introduces DeepContext, a novel profiler that links program contexts across high-level Python code, deep learning frameworks, underlying libraries written in C/C++, as well as device code executed on GPUs. DeepContext incorporates measurements of both coarse- and fine-grained performance metrics for major deep learning frameworks, such as PyTorch and JAX, and is compatible with GPUs from both Nvidia and AMD, as well as various CPU architectures, including x86 and ARM. In addition, DeepContext integrates a novel GUI that allows users to quickly identify hotspots and an innovative automated performance analyzer that suggests potential optimizations to users based on performance metrics and program context. Through detailed use cases, we demonstrate how DeepContext can help users identify and analyze performance issues to enable quick and effective optimization of deep learning workloads. We believe DeepContext is a valuable tool for users seeking to optimize complex deep learning workflows across multiple compute environments.


AI Empowered Net-RCA for 6G

Qiu, Chengbo, Yang, Kai, Wang, Ji, Zhao, Shenjie

arXiv.org Artificial Intelligence

In order to realize the vision of connecting everything worldwide, sixth-generation (6G) wireless networks are receiving unprecedented attention and are anticipated to build a bridge to the smart society of the future. Compared with 5G, 6G is expected to boost network spectrum efficiency and offer massive access along with improved reliability and latency, as shown in Table 1. Consequently, future 6G networks are expected to support wireless connections for various emerging applications and massive numbers of intelligent devices, e.g., extended reality (XR) services, telemedicine, and brain-computer interfaces, and to deliver low latency and high data rates for heterogeneous devices. We illustrate the key emerging 6G applications in Fig. 1. Specifically, the application types of 6G can be classified as MBRLLC, mURLLC, HCS and MPS [8]. The most representative 6G use cases are presented as follows; the network requirements are shown in Table 2. Holographic Type Communication (HTC): HTC is expected to deliver 3D images from one or multiple source nodes to different destinations. Owing to the extremely large volume of data needed for recording and reconstruction, HTC requires bandwidth up to the Tbps level for transmission.